Streptomyces olindensis DAUFPE 5622, which was isolated from a Brazilian soil sample, produces the antitumor anthracycline 
cosmomycin D. The genome sequence is 9.4 Mb in length, with a G+C content of 71%. Thirty-four putative secondary metabo- 
lite biosynthetic gene clusters were identified, including the cosmomycin D cluster. 
Streptomyces olindensis DAUFPE 5622 was isolated in the 1960s 
from a Brazilian soil sample. It produces the anthracycline 
cosmomycin D, which has antitumor activity and has attracted 
interest because of its distinctive glycosylation pattern (1-3). The 
first goal of the genome sequencing project was to determine 
the complete sequence of the cosmomycin biosynthetic cluster; 
the DNA sequence of part of the cluster was described earlier (1). 
A further goal was to identify additional secondary metabolite 
clusters in order to help isolate further interesting secondary me- 
tabolites from the strain. The genome sequences of Streptomyces 
strains often contain 20 to 30 potential secondary metabolite clus- 
ters, most of which do not correspond to known products of the 
strain (4). Although they have been called cryptic or silenced clus- 
ters, it is likely that they are expressed under appropriate physio- 
logical conditions. A data mining approach based on genome se- 
quences is a promising route for isolating novel biologically active 
compounds. 

The genome sequence was obtained using 454 pyrosequencing 
technology. The 650,799 reads obtained correspond to a coverage 
of about 29×. The assembly contains 120 contigs, with a total 
length of 9.4 Mb and a G+C content of 71%. The mean size of the 
contigs is 78 kb, and the N50 is 246 kb. The genome sequence 
should contain one linear plasmid of 76 kb and, if it is like other 
Streptomyces genomes, a single linear chromosome (5). 
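
The assembly statistics quoted above (contig count, total length, mean contig size, N50) can be recomputed from a list of contig lengths. The following is a minimal sketch, not the procedure used in this work: the function names `assembly_stats` and `implied_mean_read_length` are hypothetical, and the contig lengths in the usage note are toy values rather than the S. olindensis contigs. N50 is taken here in its usual sense, the length of the contig at which the cumulative length of contigs sorted from longest to shortest first reaches half of the assembly.

```python
from typing import Dict, List


def assembly_stats(contig_lengths: List[int]) -> Dict[str, float]:
    """Summarize an assembly from its contig lengths (in bp)."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    ordered = sorted(contig_lengths, reverse=True)  # longest first
    total = sum(ordered)
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        # N50: first contig at which the running sum covers half the assembly
        if 2 * running >= total:
            n50 = length
            break
    return {
        "contigs": len(ordered),
        "total_bp": total,
        "mean_bp": total / len(ordered),
        "n50_bp": n50,
    }


def implied_mean_read_length(coverage: float, genome_bp: float, n_reads: int) -> float:
    """Mean read length implied by coverage = n_reads * read_length / genome_bp."""
    return coverage * genome_bp / n_reads
```

For example, `assembly_stats([100, 80, 60, 40, 20])` gives a total of 300 bp, a mean of 60 bp, and an N50 of 80 bp. Applying `implied_mean_read_length(29, 9.4e6, 650799)` to the figures reported here suggests a mean read length of roughly 420 bases; this is an inference from the stated coverage, not a value given in the text.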

Glimmer (6) was used to determine potential protein-coding 
genes, which were annotated using the Rapid Annotations using 
Subsystems Technology (RAST) server (7). The genome was also 
annotated using the NCBI Prokaryotic Genome Annotation Pipeline 
version 2.0 (8) (http://www.ncbi.nlm.nih.gov/). There were 8,287 
predicted protein-coding genes and 4 rRNA gene clusters. Sixty- 
four tRNA sequences were predicted using the tRNAscan-SE Web 
server (9). Thirty-four secondary metabolite clusters were pre- 
dicted using antiSMASH (10), including 7 polyketide synthases 
(PKS), 3 nonribosomal peptide synthetases (NRPS), 3 hybrid 
clusters (2 PKS-NRPS; 1 NRPS-terpene), 1 aminoglycoside, 3 siderophores, 
3 bacteriocins, 3 butyrolactones, 5 terpenes, 1 ectoine, 
and 5 clusters of other hypothetical functions. The cosmomycin 
cluster is about 40 kb long and contains 42 predicted genes, whose 
functions will be determined by further experimental studies. The 
sequence of the genome of S. olindensis will assist in the develop- 
ment of genetic engineering strategies for the production of new 
compounds with biotechnological potential. 
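
The per-class breakdown of the predicted secondary metabolite clusters can be tallied to confirm that it accounts for all 34 clusters. The sketch below simply transcribes the counts stated in the text into a dictionary; the variable and function names are hypothetical, and it does not parse antiSMASH output.

```python
from typing import Dict

# Counts transcribed from the text; the three hybrid clusters are split by type.
cluster_counts: Dict[str, int] = {
    "PKS": 7,
    "NRPS": 3,
    "hybrid PKS-NRPS": 2,
    "hybrid NRPS-terpene": 1,
    "aminoglycoside": 1,
    "siderophore": 3,
    "bacteriocin": 3,
    "butyrolactone": 3,
    "terpene": 5,
    "ectoine": 1,
    "other": 5,
}


def total_clusters(counts: Dict[str, int]) -> int:
    """Sum the per-class counts into a total number of clusters."""
    return sum(counts.values())


if __name__ == "__main__":
    print(total_clusters(cluster_counts))
```

The total agrees with the 34 clusters reported above.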

Nucleotide sequence accession numbers. This whole-genome 
shotgun project has been deposited at DDBJ/EMBL/GenBank un- 
der the accession no. JJOH00000000. The version described in this 
paper is version JJOH01000000. 